Pattern-Based Clustering for Database Attribute Values

Authors

  • Matthew Merzbacher
  • Wesley W. Chu
Abstract

Affiliation: Computer Science Department, University of California, Los Angeles, CA 90024

We present a method for automatically clustering similar attribute values in a database system spanning multiple domains. The method constructs an attribute abstraction hierarchy for each attribute using rules that are derived from the database instance. Each rule has a confidence and a popularity that combine to express the "usefulness" of the rule. Attribute values are clustered if they are used as the premise for rules with the same consequence. By iteratively applying the algorithm, a hierarchy of clusters can be found. The algorithm can be improved by allowing domain expert supervision during the clustering process. An example as well as experimental results from a large transportation database are included.
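
The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not define confidence, popularity, or how they combine, so the definitions below are assumptions made only for illustration: confidence is the fraction of rows with the premise value that also have the consequence value, popularity is the fraction of all rows matching the rule, and usefulness is their product. The function names, the `min_usefulness` threshold, and the toy rows are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def derive_rules(rows, premise_attr, consequence_attr):
    """Derive rules "premise value -> consequence value" from the rows.

    Assumed definitions (not taken from the paper):
      confidence = count(premise and consequence) / count(premise)
      popularity = count(premise and consequence) / total number of rows
    """
    pair_counts = Counter((r[premise_attr], r[consequence_attr]) for r in rows)
    premise_counts = Counter(r[premise_attr] for r in rows)
    total = len(rows)
    return [
        {
            "premise": premise,
            "consequence": consequence,
            "confidence": n / premise_counts[premise],
            "popularity": n / total,
        }
        for (premise, consequence), n in pair_counts.items()
    ]


def cluster_by_consequence(rules, min_usefulness=0.1):
    """Group premise values whose sufficiently useful rules share a consequence."""
    clusters = defaultdict(set)
    for rule in rules:
        # Assumed combination: usefulness is the product of the two measures.
        usefulness = rule["confidence"] * rule["popularity"]
        if usefulness >= min_usefulness:
            clusters[rule["consequence"]].add(rule["premise"])
    # Keep only groups that actually merge two or more attribute values.
    return {c: values for c, values in clusters.items() if len(values) > 1}


# Hypothetical toy instance: "type" is the attribute being clustered.
rows = [
    {"type": "T1", "size": "large"},
    {"type": "T1", "size": "large"},
    {"type": "T2", "size": "large"},
    {"type": "T2", "size": "large"},
    {"type": "T3", "size": "small"},
    {"type": "T3", "size": "small"},
]

rules = derive_rules(rows, "type", "size")
clusters = cluster_by_consequence(rules)
print(clusters)
```

In this toy instance, T1 and T2 fall into one cluster because both are premises of rules with the consequence "large". Building a hierarchy would mean treating each cluster as a single value and repeating the process, which this sketch does not attempt.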

Similar resources

Pattern-Based Clustering for Database Attribute Values

We present a method for automatically clustering similar attribute values in a database system spanning multiple domains. The method constructs an attribute abstraction hierarchy for each attribute using rules that are derived from the database instance. The rules have a confidence and popularity that combine to express the "usefulness" of the rule. Attribute values are clustered if they are u...

A Database Model for Medical Consultation

The database model presented in this paper is suitable for applications in which queries may require non-crisp references to certain attributes. The data item (attribute) values may be crisp or fuzzy. For instance, such adjectives as 'high' or 'normal' may be attribute values for the attribute blood pressure. A disease or a condition can be described by a number of symptoms which may be crisp al...

Separating indexes from data: a distributed scheme for secure database outsourcing

Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...

A Novel Technique for Pattern Extraction in Mixed Data

Knowledge discovery in databases, or data mining, is an important issue in the development of data and knowledge base systems. The Self Organizing Map (SOM) is a vector quantization method which places the prototype vectors on a regular low-dimensional grid in an ordered fashion. Clustering data and extracting patterns from the clusters are very important tasks in data mining. An attribute-oriented...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Publication date: 1993